Create a data visualisation showing average rating and proportion of cocoa percent (% chocolate) greater than or equal to 70% by top 15 company location.
To address the requirements of the task, the chocolate.csv data set was used. The DT package was installed to display an interactive datatable that augments the graph, while the crosstalk package was installed to link multiple HTML widgets (e.g. a graph and a datatable) within RMarkdown.
The data preparation plan was as follows:
Select the 3 columns of interest (company_location, rating and cocoa_percent) from the data set.
Group data by company location, creating a new summary table of frequency count, average rating score and average cocoa percentage for each company location.
Filter out all company locations whose average cocoa percentage is less than 70%.
Slice out the top 15 locations by count.
Code chunk:
# Load the tidyverse for read_csv(), the pipe and the dplyr verbs
library(tidyverse)
choc <- read_csv("data/chocolate.csv")
# Drop the % symbol in the cocoa percent column and convert it to numeric
choc$cocoa_percent <- as.numeric(gsub("%", "", as.character(choc$cocoa_percent)))
# Select only the 3 relevant columns and create a new summary table grouped by company location.
# Keep only company locations with an average cocoa percentage of 70 and above
choc_loc <- choc %>%
  select(company_location, rating, cocoa_percent) %>%
  group_by(company_location) %>%
  summarise(n = n(), avgR = mean(rating), avgPct = mean(cocoa_percent)) %>%
  filter(avgPct >= 70) %>%
  ungroup()
# Find the top 15 company locations by count
top15_n <- choc_loc %>% slice_max(n, n = 15)
# Round the average rating to 2 decimal places and the average cocoa percentage to 1
top15_n$avgR <- round(top15_n$avgR, digits = 2)
top15_n$avgPct <- round(top15_n$avgPct, digits = 1)
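The task statement asks for the proportion of cocoa percentages greater than or equal to 70%, whereas the chunk above summarises the average cocoa percentage per location. If the proportion reading is preferred, it could be computed along the following lines. This is only a minimal sketch on made-up toy data; the data frame `toy` and the column name `prop70` are illustrative assumptions and not part of the original analysis.

```r
library(dplyr)

# Made-up toy data standing in for chocolate.csv (values are illustrative only)
toy <- data.frame(
  company_location = c("A", "A", "A", "B", "B"),
  rating           = c(3.0, 3.5, 4.0, 2.5, 3.5),
  cocoa_percent    = c(70, 65, 75, 60, 72)
)

toy_summary <- toy %>%
  group_by(company_location) %>%
  summarise(
    n      = n(),
    avgR   = mean(rating),
    # The mean of a logical vector is the share of TRUE values,
    # i.e. the proportion of rows with cocoa percent >= 70
    prop70 = mean(cocoa_percent >= 70),
    .groups = "drop"
  )

print(toy_summary)
```

Swapping `toy` for `choc` would give the same summary for the real data, assuming cocoa_percent has already been converted to numeric.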
A sketch of the proposed visualisation is shown below.
The graph on the left will show the top 15 company locations in descending order of count. An interactive tooltip will display the Rating and Cocoa Percentage of a company location when the cursor hovers over its bar.
The interactive datatable will allow users to view the list of company locations in order of rating or cocoa percentage by clicking on the sort button at the top of each column.
As the two components will be linked, selecting any row in the table (e.g. the row with the highest rating or lowest cocoa percentage) will highlight the corresponding bar on the left to show the relative position of the company location.
To create the graph described above, ggplotly was used with the following customisations:
The data table was formatted to display only the Company Location, Rating and Cocoa Percentage and it was linked to the bar chart using the crosstalk method.
The code chunk is as follows:
library(plotly)
library(crosstalk)
# Wrap data frame in SharedData so the graph and the table share selections
shared_choc <- SharedData$new(top15_n)
# Build the bar chart; the text aesthetic supplies the tooltip content
p <- ggplot(shared_choc,
            aes(x = reorder(company_location, n), y = n,
                text = paste("Company Location: ", company_location,
                             "<br>No. of Locations: ", n,
                             "<br>Average Rating: ", avgR,
                             "<br>Average Cocoa Percentage: ", avgPct, "%"))) +
  geom_bar(stat = "identity", fill = "saddlebrown") +
  coord_flip() +
  xlab("Company Location") +
  ylab("No. of Locations") +
  ggtitle("Chocolate Ratings by Top 15 Company Locations") +
  theme_minimal() +
  theme(plot.title = element_text(size = 9))
# Render graph and table side by side
bscols(widths = c(7, 5),
  ggplotly(p, tooltip = "text"),
  DT::datatable(shared_choc, rownames = FALSE,
    options = list(pageLength = 5, scrollX = TRUE,
      # Hide the count column (zero-based index 1)
      columnDefs = list(list(visible = FALSE, targets = c(1)))),
    # Named vector renames only the displayed columns
    colnames = c("Company Location" = "company_location", "Rating" = "avgR", "Cocoa %" = "avgPct"))
)
U.S.A. was the top company location with a count of 1136, followed far behind by Canada (177) and France (176).
However, the highest average chocolate rating was from Australia at 3.36, while the highest average cocoa percentage was from the UK at 73.8%.
The only Asian country in the top 15 list was Japan, with a count of 31, an average rating of 3.13 and an average cocoa percentage of 71%.
The lowest average chocolate rating came from Ecuador at 3.04, while the lowest average cocoa percentages came from Venezuela and Denmark, tied at 70%.
Interestingly, the locations with the highest chocolate ratings, i.e. Australia, Denmark, Switzerland and Canada, all had relatively low average cocoa percentages (70-71.9%).
One unexpected challenge encountered in this exercise was that using crosstalk disrupted the default CSS framework of distill, which left the usual text formatting and sizing broken.
Further research showed that the cause was a Bootstrap HTML dependency attached to filter_select(), filter_checkbox(), and bscols(). This dependency caused crosstalk to degrade the overall look of the page when used in a non-Bootstrap CSS framework like distill.
RStudio released a newer version of crosstalk in 2021, and the issue was resolved by installing the latest version of the package.
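Before updating, it can help to confirm which version of crosstalk is currently installed. The snippet below is a minimal sketch using base R utilities; the variable name `installed_version` is only illustrative, and the install call is left commented out so that nothing is changed unintentionally.

```r
# Report the installed crosstalk version, if any, before deciding whether to update
if (requireNamespace("crosstalk", quietly = TRUE)) {
  installed_version <- as.character(utils::packageVersion("crosstalk"))
  message("crosstalk version: ", installed_version)
} else {
  installed_version <- NA_character_
  message("crosstalk is not installed")
}

# To pull the latest CRAN release, run this interactively:
# install.packages("crosstalk")
```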